Abstract

Background: Most previous studies of morphological and molecular data have consistently supported the 
monophyly of the true water bugs (Hemiptera: Nepomorpha). An exception is a recent study by Hua et al. 
(BMC Evol Biol 9: 134, 2009) based on nine nepomorphan mitochondrial genomes. In the analysis of Hua et al. 
(BMC Evol Biol 9: 134, 2009), the water bugs in the group Pleoidea formed the sister group to a clade that 
consisted of Nepomorpha (the remaining true water bugs) + Leptopodomorpha (shore bugs) + Cimicomorpha 
(assassin bugs and relatives) + Pentatomomorpha (stink bugs and relatives), thereby suggesting that fully aquatic 
hemipterans evolved independently at least twice. Based on these results, Hua et al. (BMC Evol Biol 9: 134, 2009) 
elevated the Pleoidea to a new infraorder, the Plemorpha. 

Results: Our reanalysis suggests that the lack of support for the monophyly of the true water bugs (including 
Pleoidea) by Hua et al. (BMC Evol Biol 9: 134, 2009) likely resulted from inadequate taxon sampling. In particular, 
long-branch attraction (LBA) between the distant outgroup taxa and Pleoidea, as well as LBA among taxa in the 
ingroup, made Nepomorpha appear to be polyphyletic. We used three complementary strategies to test and 
alleviate the effects of LBA: (1) the removal of distant outgroups from the analysis; (2) the addition of closely related 
outgroups; and (3) the addition of a mitochondrial genome from a second family of Pleoidea. We also performed 
likelihood-ratio tests to examine the support for monophyly of Nepomorpha with different combinations of taxa 
included in the analysis. Furthermore, we found that specimens of Helotrephes sp. were misidentified as Paraplea
frontalis (Fieber, 1844) by Hua et al. (BMC Evol Biol 9: 134, 2009). 

Conclusions: All analyses that included the addition of more taxa significantly and consistently supported the 
placement of Pleoidea within the Nepomorpha (i.e., supported the monophyly of the traditional true water bugs). 
Our analyses further support a close relationship between Notonectoidea and Pleoidea within Nepomorpha, and 
the superfamilies Nepoidea, Ochteroidea, Naucoroidea, and Pleoidea are resolved as monophyletic in all trees with 
strong support. Our results also confirmed that monophyly of Nepomorpha clearly is not refuted by the 
mitochondrial genome data. 

Background

Long-branch attraction (LBA) is a bias that results in 
spurious support for relationships between two (or more) 
long branches in an estimated phylogenetic tree when the 
assumed model of evolution is too simplistic [1,2]. Biases 
associated with LBA have been identified in many phylo- 
genetic studies, including analyses of mammals [3,4], birds 
[5], arthropods [6-8], and seed plants [9,10]. The most 
common problem occurs when distantly related ingroup 
taxa are poorly sampled and one or a few distant outgroup 
taxa are included to root the tree. Under these conditions, 
a simplistic model of evolution is unlikely to sufficiently 
account for homoplasy, and long branches will be con- 
nected (or attracted to one another) in the inferred tree 
based on homoplastic similarities [11]. One method for 
detecting this problem involves conducting phylogenetic 
analyses with and without outgroups [12]. If the inclusion 
of a distant outgroup changes the inferred relationships of 
the ingroup, it may be better to infer ingroup relationships 
separately and consider other methods for rooting the 
resulting tree, or to use more closely related outgroups 
[13]. In addition, several strategies have been suggested 
to reduce the effects of LBA, including: (1) excluding 
long-branch taxa from the analysis, (2) replacing the 
long-branch taxa with slow-evolving close relatives, (3) 
removing fast-evolving proteins or sites, (4) improving 
the models of character evolution assumed in the ana- 
lysis, and (5) sampling more taxa to break up long 
branches in the tree [14-16]. Among these methods, 
adding taxa to break up long branches is one of the 
most widely suggested strategies to reduce the effects of 
LBA bias [17,18]. Appropriate and thorough taxon sam- 
pling is thus one of the most important considerations 
for accurate phylogenetic estimation [16-19]. Phylogenetic
analyses based on relatively few distantly related taxa
(but with each taxon represented by many characters, 
such as from a mitochondrial genome) are particularly 
prone to problems with LBA; such analyses are likely to 
produce high support values for incorrect phylogenetic 
relationships [16,20]. 

The relationships of the true water bugs (Hemiptera: 
Nepomorpha) within heteropteran insects [21] have been 
the subject of many studies of molecular and morphological 
data. The monophyly of Nepomorpha has been consistently 
and strongly supported by studies based on morphological 
characters [22-25], molecular data (partial sequences of 16S 
rDNA and 28S rDNA [26], and four Hox genes [27]), and 
by combined data analyses [26]. In contrast, the monophyly 
of Nepomorpha has only been disputed in the study of Hua 
et al. [28], who based their analysis on nine nepomorphan 
mitochondrial genomes (mt-genomes). In the study by Hua 
et al. [28], Pleoidea was not supported as part of Nepomor- 
pha, but instead was resolved as the sister-group of a clade 
that included the remaining species of Nepomorpha plus 
Leptopodomorpha, Cimicomorpha, and Pentatomomorpha 
(Figure 1). As a result of these analyses, Hua et al. [28] 
suggested that Pleoidea should be raised from a super- 
family within Nepomorpha to the infraorder Plemorpha, 
outside of Nepomorpha. Their conclusions were sup- 
ported by high Bayesian posterior probabilities (BPP) 
and maximum likelihood (ML) bootstrap proportions in 
five of eight phylogenetic analyses. 

The study by Hua et al. [28] has both strengths and 
weaknesses when compared with previous studies of the 
phylogenetic relationships of Nepomorpha. Each taxon 
sampled by Hua et al. [28] was sampled for complete 
mitochondrial genomes, so the number of characters 
available for phylogenetic inference was large. In contrast,
previous studies [22-27] examined fewer characters per
taxon, but included more taxa in the analyses. Thorough
taxon sampling can often lead to more accurate phylogen-
etic inference, even if the total number of characters in
the analysis is decreased [29-32]. In particular, the position
of Pleoidea in the study of Hua et al. [28] may have been
affected by the inclusion of just one of two families in
Pleoidea (Helotrephidae, without any representation of
Pleidae; see Results and discussion). This made it more
likely for the tree to be rooted by connection of the dis-
tantly related outgroup taxa to the long branch leading to
Helotrephes sp. (Figure 1).

Figure 1 The consensus phylogeny based on the data sets analyzed by Hua et al. [28]. Five of the eight phylogenetic analyses they
conducted supported this tree. Numbers at the nodes indicate the BPP and ML support values for each data matrix analyzed by Hua et al. [28] in
the following order: PP and BP for PCG123RT, PP and BP for PCG12RT, and PP for PCG12. Branch lengths are similar across analyses; these branch
lengths represent the analysis of the PCG123RT data set. The scale bar represents the number of expected substitutions per site.

A second consideration is the selection of outgroups 
used by Hua et al. [28]. Fulgoromorpha is very distantly 
related to the ingroup Nepomorpha, making problems 
associated with LBA more likely [30,33]. Furthermore, in 
groups more closely related to Nepomorpha, Hua et al. 
[28] sampled only one representative for three different 
infraorders (Cimicomorpha, Leptopodomorpha and Pen- 
tatomomorpha). Thus, we examined the possibility that 
the findings of Hua et al. [28] resulted from biases asso- 
ciated with inadequate taxon sampling. Because the 
model-based methods used by Hua et al. [28] are less 
sensitive to the problems of LBA [34-36], these authors 
did not consider LBA to be a likely explanation of their 
results. However, models of evolution are never perfect, 
and poor taxon sampling exacerbates the problems of 
model insufficiency, so the use of model-based inference 
methods is not, by itself, a panacea for dealing with 
biases associated with LBA [11,16]. 

We undertook the current study to explore the con- 
clusion of Hua et al. [28] that the Pleoidea evolved their 
fully aquatic lifestyle independently of the remaining 
true water bugs in Nepomorpha. Our hypothesis was 
that this conclusion was a result of LBA between the 
single sampled representative of Pleoidea and the dis- 
tantly related outgroup, Fulgoromorpha. We tested this 
hypothesis by: (1) removing the outgroups and re- 
estimating the phylogeny of Nepomorpha only, to detect 
whether the ingroup topology is affected by the long- 
branch outgroup taxa [12,13]; (2) increasing taxon sampling 
of groups related to Nepomorpha, including Leptopodo- 
morpha, Cimicomorpha, and Pentatomomorpha [37]; 
and (3) adding new mt-genome data for a representative 
of the second family within Pleoidea, namely Pleidae 
(the presumed sister-group of Helotrephidae). 

Results and discussion 

Misidentification of previously sampled taxa 

To test our hypothesis that the conclusion of Hua et al. 
[28] (Pleidae outside of the remaining Nepomorpha) was 
an artifact of limited taxon sampling, we sampled a mem- 
ber of the family Helotrephidae. Helotrephidae is generally 
accepted as the sister-group of Pleidae [22,23,25,26], so we
reasoned that including the sister-group of Pleidae was
the best way to break up the long terminal branch leading 
to this taxon. We sequenced the mt-genome of Helo-
trephes semiglobosus semiglobosus Stål, 1860 (Nepomor-
pha: Helotrephidae). However, after we obtained a partial
mt-genome sequence of Helotrephes semiglobosus semiglo- 
bosus (GenBank accession number: KJ027513) with the 
length of 8,876 bp, including 29 genes (two rRNAs, ten 
protein coding genes [PCGs] and 17 tRNAs) as well as the 
control region, we found extreme similarity (97.4%) be- 
tween this species and the specimen previously identified 
by Hua et al. [28] as Paraplea frontalis (Fieber, 1844). As 
this level of sequence similarity was unexpected between 
species in these two families, we checked the specimens 
identified previously as Paraplea frontalis by Hua et al. 
[28]. We found that those specimens are properly identi- 
fied as Helotrephes sp., and so represent a species in Helo- 
trephidae rather than Pleidae. As the mt-genome of a 
species in Helotrephidae was already represented in the 
data set, we then sequenced a new mt-genome of Para- 
plea frontalis, as a true representative of Pleidae. Hence- 
forth, we label the sample sequenced by Hua et al. [28] 
correctly as Helotrephes sp.
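
The similarity check described above can be illustrated with a small sketch. The text does not state how the 97.4% figure was computed, so the definition used here (identical positions divided by aligned positions where neither sequence has a gap) is an assumption, and the two short sequences are hypothetical toy data rather than real mt-genome fragments.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes seq_a and seq_b are rows of the same pairwise alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (toy data, not from GenBank).
toy_a = "ATGCATGCAT-GCATGCATGC"
toy_b = "ATGCATGCATAGCATGTATGC"
print(round(percent_identity(toy_a, toy_b), 1))
```

A value near 100% between specimens assigned to two different families is the kind of signal that prompted the re-examination of the voucher specimens.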

Removal of outgroups from the analysis 

The most common problem of LBA is that distantly re- 
lated outgroups have a biased attraction to long branches 
within the ingroup [3,4,38]. For this reason, a common 
suggestion is to conduct phylogenetic analyses both with 
and without the outgroups to compare whether the dis- 
tantly related outgroup alters the ingroup topology [16]. 
To test if outgroup selection affected the topology of our 
ingroup, we ran analyses using only the ingroup taxa of 
Hua et al. [28]. Using Bayesian and ML analyses, all data 
matrices of Hua et al. [28] generated phylogenetic trees 
with the same topology (Figure 2). When the outgroups 
are removed, the ingroup topology is distinct from that 
obtained by Hua et al. [28] (Figure 1). In all of these ana- 
lyses, Helotrephes sp. was connected to Enithares tibialis 
Liu et Zheng, 1991 (Nepomorpha: Notonectoidea). 
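
As a minimal sketch of the data preparation behind an ingroup-only analysis, the following code removes named outgroup taxa from an alignment and then drops columns that are left with gaps only. The dictionary representation, the function names, and the toy sequences are all illustrative assumptions; the text does not describe the tools used for this step.

```python
from typing import Dict, Iterable


def prune_taxa(alignment: Dict[str, str], drop: Iterable[str]) -> Dict[str, str]:
    """Return a copy of the alignment without the listed taxa."""
    drop_set = set(drop)
    missing = drop_set.difference(alignment)
    if missing:
        raise KeyError(f"taxa not in alignment: {sorted(missing)}")
    return {name: seq for name, seq in alignment.items() if name not in drop_set}


def drop_gap_only_columns(alignment: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Remove columns that contain only gap characters after pruning."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if any(char != gap for char in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Hypothetical toy alignment; the sequences are invented for illustration.
toy = {
    "Lycorma_delicatula": "ATGA-C",
    "Enithares_tibialis": "ATG-TC",
    "Helotrephes_sp": "ATG-TT",
}
ingroup_only = drop_gap_only_columns(prune_taxa(toy, ["Lycorma_delicatula"]))
print(ingroup_only)
```

Re-estimating the tree from the pruned matrix and comparing the ingroup topology with the full analysis is the comparison described in this section.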

Addition of outgroups 

Outgroup selection is an important factor for reconstruct- 
ing phylogenetic trees, because the choice of outgroup 
taxa can affect the ingroup topology [39]. However, out- 
group selection is often not adequately considered [40,41]. 
Moreover, several authors have pointed out that adding 
more outgroup taxa in the sister-group to a phylogenetic 
analysis can improve the accuracy of phylogenetic estima- 
tion, and also should help break up the LBA between any 
long-branch members of the ingroup and the outgroup 
[38,42,43]. Therefore, we added three more taxa (selected 
from the sister-group of Nepomorpha) to the dataset of 
Hua et al. [28]. 

Figure 2 Phylogenetic results based on analyses of ingroup taxa only. Numbers at the nodes are BPP and ML support values in the
following order: PP and BP for PCG12, PP and BP for PCG123, PP and BP for PCG12RT, and PP and BP for PCG123RT. The red dot on the tree
indicates the clade of Notonectoidea + Pleoidea. The scale bar represents the number of expected substitutions per site based on analysis of the 
PCG12 data set. 



Both Bayesian inference and ML analyses resulted in 
the same topology (Figure 3A); the position of the long 
branch of Helotrephes sp. (Nepomorpha: Pleoidea) was 
supported within Nepomorpha rather than outside of 
Nepomorpha, in contrast to the findings of Hua et al. 
[28]. The monophyly of Nepomorpha (including both 
Helotrephidae and Pleidae) received strong support in 
Bayesian analyses (based on posterior probabilities: PP) 
but with relatively weak support in ML analyses (based 
on bootstrap proportions: BP). The monophyletic Nepoi- 
dea, Ochteroidea, and Naucoroidea were strongly sup- 
ported by both PP and BP, similar to the results of Hua 
et al. [28]. Additionally, the topology of the infraordinal
relationships of Heteroptera is similar to previous work
[44] also based on mt-genomes, namely (Gerromorpha + 
(Pentatomomorpha + (Leptopodomorpha + (Cimicomor- 
pha + Nepomorpha)))). 

We also estimated phylogenetic trees without the 
long-branched outgroup of Lycorma delicatula (White, 
1845) (Hemiptera: Auchenorrhyncha: Fulgoromorpha). 
The major changes that resulted from deletion of this 
taxon were the position of Helotrephes sp. and Naucoroi- 
dea (Figure 3B). In both Bayesian and ML analyses, Helo- 
trephes sp. (Nepomorpha: Pleoidea) was supported as the 
sister group of Enithares tibialis (Nepomorpha: Notonec-
toidea). The close relationship between the Notonectoidea
and Pleoidea also has been supported in most previous
studies [22-26]. Although the relationships among fam-
ilies of Nepomorpha varied among trees, all the analyses
that excluded Fulgoromorpha supported the monophyly
of Nepomorpha (including Helotrephidae as well as
Pleidae, when the latter was added to the analyses).
These analyses demonstrate that the conclusions of Hua
et al. [28] were at least partly a result of their use of a
very distant outgroup.

Figure 3 Phylogenetic trees based on the inclusion of additional closely related outgroups. (A) Analysis including the distant outgroup
Lycorma delicatula (Hemiptera: Auchenorrhyncha: Fulgoromorpha). (B) Analysis excluding the distant outgroup Lycorma delicatula. Numbers at the
nodes are BPP (left) and ML support values (right). Yellow dots on each phylogram indicate the clades of Nepomorpha, and red dots indicate the
clades of Notonectoidea + Pleoidea. Asterisks indicate the additional closely related outgroups. The scale bar represents the number of
expected substitutions per site.

Addition of a new mitochondrial genome of Pleidae 

We sequenced and assembled a new mt-genome for Para- 
plea frontalis (Fieber, 1844), except for small portions of 
12S rRNA gene and the control region (polynucleotide se- 
quences in these two regions proved difficult to resolve 
with certainty). This mt-genome was 14,143 bp in length 
and has been deposited in the GenBank (Accession num- 
ber: KJ027516). The mt-genome of Paraplea frontalis con- 
tained the typical 37 genes (two rRNAs, 13 PCGs and 22 
tRNAs), with the same gene order as observed in most 
other true bugs [44,45] (Table 1). Gene overlaps were 
found at 11 gene junctions and involved a total of 32 bp, 
which may make the genome relatively compact. Twelve 
of the 13 PCGs initiated with ATN as start codon, whereas 
the COI gene started with TTG. Eight PCGs ended with 
the termination codon TAA and one with TAG, whereas 
the remaining four were terminated with T. All of the 22 
typical animal tRNA genes were observed in the Paraplea 
frontalis mt-genome, ranging from 63 to 74 bp. Most of 
the tRNAs could be folded into typical cloverleaf second- 
ary structures, except that the stem of the dihydrouridine 
(DHU) arm simply formed a loop in tRNA-Ser (GCT) 
(see Additional file 1). There are 22 unmatched base pairs 
in the Paraplea frontalis mitochondrial tRNA secondary 
structures. 
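
The gene sizes and intergenic values reported in Table 1 follow directly from the gene coordinates, and the sketch below shows that arithmetic for the rows of Table 1 that are legible in this copy. The first entry has no gene name in the surviving table, so it is labelled with a placeholder; the function names are illustrative. The gap definition follows the table footnote: nucleotides separating a gene from the upstream one, with negative values for overlaps.

```python
# (gene label, start, end) for the legible rows of Table 1, in genome order.
# "upstream_gene" is a placeholder label: its name is missing from the table.
ROWS = [
    ("upstream_gene", 9718, 9782),
    ("ND6", 9785, 10285),
    ("CytB", 10285, 11421),
    ("tRNA-Ser", 11420, 11487),
    ("ND1", 11504, 12421),
    ("tRNA-Leu", 12422, 12485),
    ("16S rRNA", 12486, 13757),
    ("tRNA-Val", 13758, 13827),
    ("12S rRNA", 13828, 14143),
]


def gene_length(start: int, end: int) -> int:
    """Length in bp of a gene given inclusive 1-based coordinates."""
    return end - start + 1


def intergenic(prev_end: int, start: int) -> int:
    """Nucleotides between a gene and its upstream neighbour (negative = overlap)."""
    return start - prev_end - 1


sizes = {name: gene_length(start, end) for name, start, end in ROWS}
gaps = {
    ROWS[i][0]: intergenic(ROWS[i - 1][2], ROWS[i][1])
    for i in range(1, len(ROWS))
}
for name, gap in gaps.items():
    kind = "overlap" if gap < 0 else "spacer"
    print(f"{name}: size {sizes[name]} bp, {kind} {gap}")
```

Only two of the overlaps mentioned in the text (at the ND6/CytB and CytB/tRNA-Ser junctions) fall within these rows; the remaining junctions lie in the part of Table 1 that is not reproduced here.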

Increased taxon sampling, especially when it breaks up 
long branches in a tree, is the most effective strategy for 
reducing the effects of LBA [16,31,32]. We added the 
representative of Pleidae, which is thought to be the 
sister-group of Helotrephidae, to help reduce the length
of the branch that led to the single species of
Helotrephidae sampled by Hua et al. [28]. We therefore
added our mt-genome of Paraplea frontalis to the four 
data matrices of Hua et al. [28] and conducted new 
phylogenetic analyses (Figure 4). 

As with our analyses that replaced the distant outgroup 
with more appropriate outgroups, the analyses that in- 
cluded a member of Pleidae supported monophyly of 
Nepomorpha (with strong PP support but weak BP sup- 
port). Moreover, these analyses strongly supported Para- 
plea frontalis (Pleidae) as the sister group of Helotrephes sp. 
(Helotrephidae). Together, Pleidae and Helotrephidae were 
supported as the sister-group of Notonectidae. The mono-
phyletic groups of Nepoidea, Ochteroidea, Naucoroidea,
Pleoidea, and Notonectoidea + Pleoidea were strongly
supported by both PP and BP in all analyses that in- 
cluded Pleidae. 

Likelihood-ratio tests 

We compared the likelihood ratios of the best solutions 
for each of our two alternative hypotheses (Pleoidea inside 
versus outside of Nepomorpha; see Additional file 2) for 
eight different combinations of taxa (Table 2). The mono- 
phyly of Nepomorpha (including Pleoidea) was strongly 
supported if we added Paraplea frontalis and/or three 
more outgroup taxa to the original data matrix of Hua 
et al. [28], as well as when we analyzed the data set with- 
out the distant outgroup consisting of Lycorma delicatula. 
The original conclusion of Hua et al. [28] (the polyphyly 
of true water bugs) was only supported with the specific 
combination of taxa analyzed in the original study. Even 
then, the likelihood-ratio support for this result over the 
alternative is weak (Table 2). 
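
The scores in Table 2 can be reproduced from the two log-likelihoods in each row. The sketch below assumes the score is computed as 2(ln L for Hypothesis 2 minus ln L for Hypothesis 1), which matches every value in Table 2 and the sign convention in its footnote; the thresholds are those given in that footnote. Function names and the short row labels are illustrative.

```python
# (taxa added, taxa deleted, ln L Hypothesis 1, ln L Hypothesis 2) from Table 2.
# "three outgroups" = Triatoma dimidiata, Yemmalysus parallelus, Saldula arsenjevi.
TABLE2 = [
    ("none", "none", -68913.45, -68909.21),
    ("none", "Lycorma delicatula", -63056.47, -63052.68),
    ("Paraplea frontalis", "none", -72517.55, -72552.52),
    ("Paraplea frontalis", "Lycorma delicatula", -66486.5, -66522.49),
    ("three outgroups", "none", -80054.05, -80074.67),
    ("three outgroups", "Lycorma delicatula", -74196.98, -74230.9),
    ("Paraplea frontalis + three outgroups", "none", -83521.55, -83612.5),
    ("Paraplea frontalis + three outgroups", "Lycorma delicatula", -77682.65, -77774.22),
]


def two_delta_l(lnl_h1: float, lnl_h2: float) -> float:
    """Score that is positive when Hypothesis 2 has the higher likelihood."""
    return 2.0 * (lnl_h2 - lnl_h1)


def classify(score: float) -> tuple:
    """Return (favored hypothesis, strength) using the Table 2 footnote thresholds."""
    magnitude = abs(score)
    if magnitude <= 10:
        strength = "weak"
    elif magnitude <= 30:
        strength = "strong"
    else:
        strength = "very strong"
    if score > 0:
        favored = "Hypothesis 2"
    elif score < 0:
        favored = "Hypothesis 1"
    else:
        favored = "neither"
    return favored, strength


scores = [two_delta_l(h1, h2) for _, _, h1, h2 in TABLE2]
for (added, deleted, _, _), score in zip(TABLE2, scores):
    print(f"added: {added}; deleted: {deleted}; 2dL = {score:.2f}; {classify(score)}")
```

Here Hypothesis 1 places Helotrephidae within Nepomorpha and Hypothesis 2 places it outside, as in the column headings of Table 2.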

Phylogeny of Nepomorpha

Given that the monophyly of Nepomorpha is consistently 
supported in all of our new analyses, we find no support 
for the new infraorder Plemorpha. Therefore, we recom- 
mend retaining Pleoidea as part of Nepomorpha. The 
superfamilies of Nepoidea (Belostomatidae + Nepidae), 
Ochteroidea (Gelastocoridae + Ochteridae), Naucoroidea 
(Aphelocheiridae + Naucoridae), and Pleoidea (Pleidae + 
Helotrephidae) are monophyletic groups in all our ana- 
lyses with high support from both PP and BP. We also 
found strong support for the close relationship between 
Notonectoidea and Pleoidea. Several synapomorphies of 
biological and ecological traits also support some of these 
monophyletic groups [24-26,46]:

Nepomorpha: the short antennae are concealed below 
the eyes; all have an aquatic lifestyle, although 
Ochteroidea (including Ochteridae and Gelastocoridae) 
live along freshwater shores rather than underwater; 
Nepoidea (including Nepidae and Belostomatidae): 
air-breathing through a siphon; 

Naucoroidea: all Aphelocheiridae and some Naucoridae 
use plastron respiration; 

Pleoidea (including Pleidae and Helotrephidae): also 
have plastron respiration, which allows them to stay 
permanently submerged; 

Notonectoidea and Pleoidea (including Notonectidae, 
Pleidae, and Helotrephidae): swim on their backs in an 
inverted position. 

Our principal goal in this study was to discuss the
monophyly of Nepomorpha and the effects of adequate
taxon sampling on this phylogenetic problem. As we did
not sample all the families of Nepomorpha, a more
thorough sampling of taxa is needed to adequately re-
solve the family relationships within Nepomorpha. In
particular, more sampling of Potamocoridae, Micronec-
tidae and Diaprepocoridae (Hemiptera: Nepomorpha)
mt-genome sequences will be needed for a thorough
analysis of the major groups within Nepomorpha.

Table 1 Organization of the Paraplea frontalis mitochondrial genome

| Gene | Strand | Position | Size (bp) | Anticodon | Start codon | Stop codon | Intergenic nucleotides a |
|---|---|---|---|---|---|---|---|
| … | N | 9718-9782 | 65 | TGG | | | 0 |
| ND6 | J | 9785-10285 | 501 | | ATT | TAA | 2 |
| CytB | J | 10285-11421 | 1137 | | ATG | TAG | -1 |
| tRNA-Ser | J | 11420-11487 | 68 | TGA | | | -2 |
| ND1 | N | 11504-12421 | 918 | | A… | TAA | 16 |
| tRNA-Leu | N | 12422-12485 | 64 | TAG | | | 0 |
| 16S rRNA | N | 12486-13757 | 1272 | | | | 0 |
| tRNA-Val | N | 13758-13827 | 70 | TAC | | | 0 |
| 12S rRNA | N | 13828-14143 | 316 | | | | 0 |

a Numbers correspond to nucleotides separating a gene from an upstream one; negative numbers indicate that adjacent genes overlap.



Conclusions 

This study provides a clear example of the importance
of adequate taxon sampling. We support the conclusion that
investigators should be cautious about making major
taxonomic rearrangements on the basis of limited taxon
sampling, even (or especially) when the number of characters
sampled per taxon is large [16,17,31,32]. Phylogenetic ana-
lyses that are based on even complete genomes of relatively
few taxa are likely to result in strongly supported, but incor-
rect, evolutionary reconstructions [16,17,47]. In the study by
Hua et al. [28], limited sampling of mt-genomes, coupled
with the use of a distant outgroup, resulted in a conclusion
that was at odds with a traditionally supported group (true
water bugs, or Nepomorpha). But even minimal additional
sampling to break up long branches in the tree, or the use of
more closely related outgroups, results in trees in which the
traditional group Nepomorpha is supported.

[Figure 4 panels: (A) Bayesian inference excluding RT; (B) ML excluding PCG12RT; (C) Combined BI and ML including RT]

Figure 4 Phylogenetic trees based on the addition of a new mitochondrial genome of Paraplea frontalis (Nepomorpha: Pleoidea). By
adding the new mt-genome of Paraplea frontalis (Fieber, 1844) to the data matrices of Hua et al. [28], we assembled four new data matrices: 16
(PCG12), 16(PCG123), 16(PCG12RT), and 16(PCG123RT). (A) Numbers at the nodes are BPP for the data matrix of 16(PCG12) (left) and 16(PCG123)
(right). (B) Numbers at the nodes are ML support values for the data matrix of 16(PCG12) (left), 16(PCG123) (middle), and 16(PCG123RT) (right).
(C) Numbers at the nodes are BPP for 16(PCG12RT) (left), ML support values for 16(PCG12RT) (middle), and BPP for 16(PCG123RT) (right). Yellow
dots on each phylogram indicate the clades of Nepomorpha, and red dots indicate the clades of Notonectoidea + Pleoidea. The scale bar
represents the number of expected substitutions per site.

Table 2 Likelihood-ratio tests for monophyly of Nepomorpha with eight different combinations of taxa

| Taxa added to analysis of Hua et al. [28] | Taxa deleted from analysis of Hua et al. [28] | ln L (Hypothesis 1) | ln L (Hypothesis 2) | 2ΔL | Hypothesis 1 a (Helotrephidae within Nepomorpha) | Hypothesis 2 a (Helotrephidae outside Nepomorpha) |
|---|---|---|---|---|---|---|
| None | None | -68913.45 | -68909.21 | 8.48 | | Weak |
| None | Lycorma delicatula | -63056.47 | -63052.68 | 7.58 | | Weak |
| Paraplea frontalis | None | -72517.55 | -72552.52 | -69.94 | Very strong | |
| Paraplea frontalis | Lycorma delicatula | -66486.5 | -66522.49 | -71.98 | Very strong | |
| Triatoma dimidiata, Yemmalysus parallelus, Saldula arsenjevi | None | -80054.05 | -80074.67 | -41.24 | Very strong | |
| Triatoma dimidiata, Yemmalysus parallelus, Saldula arsenjevi | Lycorma delicatula | -74196.98 | -74230.9 | -67.84 | Very strong | |
| Paraplea frontalis, Triatoma dimidiata, Yemmalysus parallelus, Saldula arsenjevi | None | -83521.55 | -83612.5 | -181.9 | Very strong | |
| Paraplea frontalis, Triatoma dimidiata, Yemmalysus parallelus, Saldula arsenjevi | Lycorma delicatula | -77682.65 | -77774.22 | -183.14 | Very strong | |

a: 2ΔL scores with an absolute value of 0 to 10 indicate weak support, >10 to 30 indicate strong support, and >30 indicate very strong support for the favored
hypothesis [48]. Scores are calculated so that positive values indicate support for Hypothesis 2, and negative values indicate support for Hypothesis 1.

In the phylogenomic era [48], many papers are report- 
ing surprising phylogenetic results that conflict with 
traditional hypotheses of relationships. Many (or even 
most) of these surprising results are based on analyses of 
many characters (even whole genomes) from very few 
taxa [16,47,49]. Strong "statistical support" for a given 
conclusion may come from strong underlying phylogen- 
etic signal, but also from systematic bias that stems from 
assuming inadequate or inappropriate models of evolu- 
tion [50]. Using large numbers of characters in a phylo- 
genetic analysis means that even small systematic biases 
associated with overly simplistic methodological as- 
sumptions are likely to be mistaken as strong phylogen- 
etic signal. Thorough taxon sampling allows the use of 
more simplistic models of evolution, because multiple 
changes at each nucleotide site can be appropriately
reconstructed through the increased sampling of the tree
[18]. If the sampling in a phylogenomic study is sparse, 
investigators should use appropriate caution before over- 
turning analyses that are based on more thorough sam- 
pling of taxa. 

Methods 

Ethics statement 

No specific permits were required for the insects col-
lected for this study in Yunnan and Hubei Provinces,
China. The insect specimens were collected from ponds
with a sturdy aquatic net. The field studies did not
involve endangered or protected species. Species of
the genera Paraplea and Helotrephes are common
small insects and are not included in the "List of Pro-
tected Animals in China".

Specimen collection 

Adult specimens of Paraplea frontalis were collected 
from Tongbiguan Village (24°36.411′N, 97°39.349′E),
Yingjiang County, Dehong City, Yunnan Province, 
China, on May 18th, 2009. Adult specimens of Helo- 
trephes semiglobosus semiglobosus were collected from 
Jin Ji Valley (29°22.339′N, 114°34.301′E), Jiu Gong Shan,
Tong Shan County, Hubei Province, China, on July 30th, 
2010. Voucher specimens are deposited in the Insect 
Molecular Systematics Lab, Institute of Entomology, 
College of Life Sciences, Nankai University, Tianjin,
China. All specimens were initially preserved in 95% 
ethanol in the field. After being transferred to the la- 
boratory, they were stored at -20°C until used for DNA 
extraction. 



PCR amplification and sequencing 

Whole genomic DNA was extracted from thoracic muscle
tissue using a CTAB-based method [51]. The mt-genome of
Paraplea frontalis was amplified by PCR in four overlapping
fragments (see Additional file 3).
The partial mt-genome of Helotrephes semiglobosus semi-
globosus was sequenced in two fragments (see Additional
file 4). Primer pairs were modified from previous work
[28] or designed from sequenced fragments.

PCR reactions were performed with TaKaRa LA Taq 
under the following conditions: 1 min initial denatur- 
ation at 94°C, followed by 30 cycles of 20 s at 94°C, 
1 min at 50°C, and 2-8 min at 68°C, and a final elong- 
ation for 10 min at 72°C. PCR products were electropho- 
resed in 1% agarose gel, purified, and then sequenced 
using an ABI 3730XL capillary sequencer with the Big-
Dye Terminator Sequencing Kit (Applied Biosystems).
All fragments were sequenced with primer walking on 
both strands. 



Sequence analysis and annotation 

Sequence files were assembled into contigs using BioEdit 
version 7.0.5.2 [52]. Protein coding regions were deter- 
mined via ORF Finder implemented at the NCBI website 
(http://www.ncbi.nlm.nih.gov/gorf/gorf.html) with inver- 
tebrate mitochondrial genetic codes. Transfer RNA ana- 
lysis was performed by tRNAscan-SE version 1.21 [53] 
with the invertebrate mitochondrial codon predictors 
and a cove score cut-off of 5. A few tRNA genes that
could not be identified by tRNAscan-SE were deter-
mined by comparison with other heteropterans. Analyses of
sequences were performed with MEGA version 5.0 [54]. 

Taxon sampling 

In total, 19 taxa were sampled. These taxa included rep- 
resentatives of 10 out of 11 extant families of Nepomor- 
pha [46,55] and 9 outgroups (Table 3). Among them, the 
mt-genome data of Paraplea frontalis is reported here 
for the first time. To make the results more directly 
comparable to the study of Hua et al. [28], we retrieved 
all mt-genomes of 15 taxa (including nine ingroups and 
six outgroups) from their work. According to the ana- 
lysis of the heteropteran infraorders of Wheeler et al. 
[37], the phylogenetic relationships of Heteroptera are 
as follows: (Enicocephalomorpha + (Dipsocoromorpha + 
(Gerromorpha + (Nepomorpha + (Leptopodomorpha + 
(Cimicomorpha + Pentatomomorpha)))))). Therefore, 
we sampled another three taxa within the sister group 
to Nepomorpha as outgroups, with one representative 
from each of Leptopodomorpha, Cimicomorpha and 
Pentatomomorpha. 

Table 3 Taxonomy and GenBank accession numbers of mitochondrial genomes for species sampled in this study (infraorders are indented under their suborder) 

Suborder / Infraorder  Superfamily     Family           Species                     Accession number
Auchenorrhyncha
  Fulgoromorpha        Fulgoroidea     Fulgoridae       Lycorma delicatula          NC_012835
Heteroptera
  Gerromorpha          Hydrometroidea  Hydrometridae    Hydrometra greeni           NC_012842
                       Gerroidea       Gerridae         Aquarius paludum            NC_012841
  Leptopodomorpha      Saldoidea       Saldidae         Saldula arsenjevi           NC_012463
                       Leptopodoidea   Leptopodidae     Leptopus sp.                FJ456946
  Cimicomorpha         Reduvioidea     Reduviidae       Triatoma dimidiata          NC_002609
                       Reduvioidea     Reduviidae       Valentia hoffmanni          NC_012823
  Pentatomomorpha      Lygaeoidea      Berytidae        Yemmalysus parallelus       NC_012464
                       Coreoidea       Rhopalidae       Stictopleurus subviridis    NC_012888
  Nepomorpha           Corixoidea      Corixidae        Sigara septemlineata        FJ456941
                       Nepoidea        Belostomatidae   Diplonychus rusticus        FJ456940
                                       Nepidae          Laccotrephes robustus       FJ456948
                       Ochteroidea     Gelastocoridae   Nerthra indica              FJ456943
                                       Ochteridae       Ochterus marginatus         FJ456950
                       Naucoroidea     Naucoridae       Ilyocoris cimicoides        NC_012845
                                       Aphelocheiridae  Aphelocheirus ellipsoideus  FJ456939
                       Notonectoidea   Notonectidae     Enithares tibialis          NC_012819
                       Pleoidea        Helotrephidae    Helotrephes sp.             FJ456951
                                       Pleidae          Paraplea frontalis          KJ027516
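
The outgroup choice follows directly from the nested topology of Wheeler et al. [37] quoted in the taxon sampling paragraph. As a minimal sketch, that topology can be written as nested tuples and queried for the sister group of Nepomorpha; the helper names are illustrative.

```python
# Sketch: find the sister group of a taxon in a rooted tree stored as
# nested tuples. The topology is the one quoted from Wheeler et al. [37].

def leaves(node):
    """Return the terminal names below a node, left to right."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out

def sister_group(node, target):
    """Return the leaves of the sister of `target`, or None if absent."""
    if isinstance(node, str):
        return None
    for i, child in enumerate(node):
        if child == target:
            siblings = [c for j, c in enumerate(node) if j != i]
            return [leaf for s in siblings for leaf in leaves(s)]
    for child in node:
        found = sister_group(child, target)
        if found is not None:
            return found
    return None

WHEELER = ("Enicocephalomorpha",
           ("Dipsocoromorpha",
            ("Gerromorpha",
             ("Nepomorpha",
              ("Leptopodomorpha",
               ("Cimicomorpha", "Pentatomomorpha"))))))

outgroup_infraorders = sister_group(WHEELER, "Nepomorpha")
```

The result lists Leptopodomorpha, Cimicomorpha and Pentatomomorpha, the three infraorders from which the additional outgroups were drawn.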

Phylogenetic analyses 

All PCGs were aligned based on their amino acid sequences 
using MUSCLE as implemented in MEGA version 5.0 [54]. The 
rRNAs and tRNAs were aligned with CLUSTAL_X version 1.83 
[56] under the default settings. The alignments of tRNA 
genes were corrected according to the secondary structures, 
especially in the stem regions. The aligned nucleotide 
sequences, excluding stop codons, were then concatenated 
and used to reconstruct the phylogeny. All phylogenetic 
trees were built using only the first and second codon 
positions of the 13 PCGs, except in the analyses in which 
we removed taxa from or added taxa to the data matrices of 
Hua et al. [28]; for those analyses we followed the methods 
of the original paper so that the results could be compared 
directly. These analyses used the same data sampling scheme 
as Hua et al. [28] and comprised four kinds of data 
matrices: (1) the PCG123RT matrix, including all three 
codon positions of PCGs, rRNA genes, and tRNA genes; 
(2) the PCG12RT matrix, including the first and second 
codon positions of PCGs, rRNA genes, and tRNA genes; 
(3) the PCG123 matrix, including all three codon positions 
of PCGs; and (4) the PCG12 matrix, including the first and 
second codon positions of PCGs. 
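
The four matrices differ only in which codon positions are kept and whether the RNA genes are appended. A minimal sketch for a single taxon follows; the function names and the toy sequences are hypothetical, and the alignments are assumed to be in frame with stop codons already removed.

```python
# Sketch: build the PCG123RT, PCG12RT, PCG123 and PCG12 rows for one taxon.
# Toy sequences only; real input would be aligned, in-frame gene alignments.

def codon_positions(aligned_cds: str, keep=(1, 2)) -> str:
    """Keep only the listed codon positions (1-based) of an in-frame row."""
    return "".join(b for i, b in enumerate(aligned_cds) if i % 3 + 1 in keep)

def build_row(pcgs, rnas, keep=(1, 2, 3), include_rna=True) -> str:
    """Concatenate filtered protein-coding genes, then optional RNA genes."""
    parts = [codon_positions(gene, keep) for gene in pcgs]
    if include_rna:
        parts.extend(rnas)
    return "".join(parts)

pcgs = ["ATGGCC", "ATAAGA"]   # two toy protein-coding genes
rnas = ["GGTT"]               # one toy RNA gene

pcg123rt = build_row(pcgs, rnas, keep=(1, 2, 3), include_rna=True)
pcg12rt = build_row(pcgs, rnas, keep=(1, 2), include_rna=True)
pcg123 = build_row(pcgs, rnas, keep=(1, 2, 3), include_rna=False)
pcg12 = build_row(pcgs, rnas, keep=(1, 2), include_rna=False)
```

Dropping third positions shortens each protein-coding gene by one third, which is why the PCG12 matrices are less affected by saturation at synonymous sites.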

We used GPU MrBayes [57] for Bayesian inference 
and raxmlGUI 1.2 [58] for ML analyses to reconstruct 
phylogenetic trees. We used the GTR + I + Γ model, 
based on results from Modeltest version 3.7 [59]. In 
Bayesian inference, two simultaneous runs of 10,000,000 
generations were conducted for each matrix. Each run 
was sampled every 100 generations. Trees sampled before 
stationarity (the first 25% of each run) were discarded 
as burn-in, and the remaining trees were used to construct 
a 50% majority-rule consensus tree. For the ML analysis, 
we conducted 1000 bootstrap replicates with a thorough 
ML search. 
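
The sampling settings above determine how many trees enter the consensus. The arithmetic is sketched below; it assumes that exactly one tree is saved per sampling interval, and some programs also record the starting state, which would shift the counts by one.

```python
# Sketch: number of trees kept for the consensus under the stated settings.
generations = 10_000_000
sample_every = 100
burnin_fraction = 0.25
runs = 2

samples_per_run = generations // sample_every            # trees saved per run
burnin_per_run = int(samples_per_run * burnin_fraction)  # trees discarded
kept_per_run = samples_per_run - burnin_per_run
kept_total = kept_per_run * runs

# A clade enters a 50% majority-rule consensus only if it occurs in more
# than half of the retained trees.
majority_threshold = kept_total / 2
```

With these numbers, 100,000 trees are saved per run, 25,000 are discarded, and 150,000 trees across the two runs contribute to the consensus.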

Tests of monophyly 

Traditionally recognized taxonomic groups are usually 
challenged when there is strong statistical support for an 
alternative phylogeny [16,60]. Likelihood-ratio tests [61] 
can provide a powerful means of examining alternatives. 
We applied likelihood-ratio tests to compare the support 
of various data sets for two different hypotheses (see 
Additional file 2): 

Hypothesis 1: Helotrephidae is nested within 
Nepomorpha (i.e., the true water bugs, including 
Helotrephidae, are monophyletic). 

Hypothesis 2: Helotrephidae falls outside the remaining 
species of Nepomorpha (i.e., the true water bugs are 
monophyletic only if Helotrephidae is excluded from the 
group). 
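
The difference between the two hypotheses can be made concrete with a monophyly check on a rooted tree. The two toy topologies below are hypothetical and heavily reduced; they are not the constraint trees of Additional file 2.

```python
# Sketch: test whether a set of taxa forms a clade in a rooted tree stored
# as nested tuples. Toy topologies only, chosen to contrast the hypotheses.

def leaf_set(node):
    """Return the set of terminal names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def all_clades(node):
    """Return the leaf sets of every clade in the tree, including tips."""
    result = {leaf_set(node)}
    if not isinstance(node, str):
        for child in node:
            result |= all_clades(child)
    return result

def is_monophyletic(tree, taxa) -> bool:
    """True if some clade of the tree contains exactly the given taxa."""
    return frozenset(taxa) in all_clades(tree)

# Hypothesis 1: Helotrephidae inside the water bugs.
H1 = ("Gerromorpha",
      (("Helotrephidae", ("Notonectidae", "Nepidae")),
       ("Saldidae", "Reduviidae")))
# Hypothesis 2: Helotrephidae outside the remaining water bugs.
H2 = ("Gerromorpha",
      ("Helotrephidae",
       (("Notonectidae", "Nepidae"), ("Saldidae", "Reduviidae"))))

water_bugs = {"Helotrephidae", "Notonectidae", "Nepidae"}
h1_monophyletic = is_monophyletic(H1, water_bugs)
h2_monophyletic = is_monophyletic(H2, water_bugs)
```

In the second toy tree the water bugs form a clade only after Helotrephidae is removed from the set, which mirrors the wording of Hypothesis 2.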

We conducted likelihood-ratio tests [61] of these two 
hypotheses for the original data set of Hua et al. [28], as 
well as for data sets with various additions and deletions 
of taxa, including both ingroups and outgroups. The 
likelihood-ratio tests were conducted using PAUP* 4 [62]. 
Heuristic searches were performed using the GTR + I + Γ 
model with 100 random addition replicates. 
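
The test statistic itself is simple to compute once the best log-likelihood under each constraint is known. The sketch below uses placeholder numbers, not results from this study. It also assumes that significance is judged against a set of null replicates (for example, from simulated data sets), which is an assumption about the procedure rather than a description of it.

```python
# Sketch: likelihood-ratio statistic for two constrained tree searches and
# an empirical p-value from null replicates. All numbers are placeholders.

def lrt_statistic(lnl_h1: float, lnl_h2: float) -> float:
    """Twice the log-likelihood difference between the two best trees."""
    return 2.0 * (lnl_h1 - lnl_h2)

def empirical_p_value(observed: float, null_statistics) -> float:
    """Share of null replicates at least as large as the observed value."""
    exceed = sum(1 for s in null_statistics if s >= observed)
    return (exceed + 1) / (len(null_statistics) + 1)

# Hypothetical log-likelihoods of the best trees under each hypothesis.
observed = lrt_statistic(-1000.0, -1012.5)

# Hypothetical statistics from nine null replicates.
null = [0.5, 1.2, 3.0, 7.9, 0.1, 2.2, 4.4, 26.0, 1.0]
p_value = empirical_p_value(observed, null)
```

Because the two topological hypotheses are not nested models, a chi-square reference distribution would not be appropriate here, which is why the sketch relies on null replicates instead.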

Availability of supporting data 

The data sets supporting the results of this article 
are available in the Dryad repository, http://dx.doi.org/ 
10.5061/dryad.tf25c [63]. 

Additional files 



Additional file 1: Putative secondary structure of the 22 tRNAs 
identified in the mitochondrial genome of Paraplea frontalis. The 
tRNAs are labeled with the abbreviations of their corresponding amino 
acids. Dashes indicate Watson-Crick base pairing and asterisks indicate 
G-U base pairing. 

Additional file 2: Constraints for the two hypotheses used in the 
likelihood-ratio test regarding the monophyly of Nepomorpha. 

Additional file 3: Primers designed for Paraplea frontalis in this 
study. 

Additional file 4: Primers designed for Helotrephes semiglobosus 
semiglobosus in this study. 